15-cs02-data

Professor Shannon Ellis

2/14/23

CS02: Vaping Behaviors in American Youth (Data)

Q&A

Q: So linear regression is just for numerical variables and logistic regression is just for a binary outcome? Can we only use one or the other depending on the data?
A: The model you use, be it linear regression, logistic regression, or something else, is always driven by the data-generating process, the assumptions of the model, and the question being asked. Specifically, for these two models, yes, the outcome variable guides the choice. If the outcome is binary, linear regression won’t work because it can always predict values beyond the two possible outcomes (meaning, you can get values other than TRUE/FALSE). Logistic regression, by contrast, models a binary outcome directly, within the constraints specified by the model.
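
To make the extrapolation point concrete, here is a minimal sketch on simulated data. The variables `x` and `y` are made up for illustration and are not from the case study. It fits both models to the same binary outcome and compares the range of their predictions.

```r
# Simulated data: a binary outcome y whose probability rises with x (illustrative only)
set.seed(42)
n <- 200
x <- runif(n, min = -4, max = 4)
y <- rbinom(n, size = 1, prob = plogis(1.5 * x))

# Linear regression treats y as numeric, so its predictions are unbounded
linear_fit <- lm(y ~ x)
# Logistic regression models the probability that y == 1
logistic_fit <- glm(y ~ x, family = binomial)

new_x <- data.frame(x = c(-6, 0, 6))
linear_pred <- predict(linear_fit, newdata = new_x)
logistic_pred <- predict(logistic_fit, newdata = new_x, type = "response")

range(linear_pred)    # can fall below 0 or above 1
range(logistic_pred)  # always stays within [0, 1]
```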

Q: I noticed the tremendous difference in the complexity/difficulty level between things introduced in lectures and the lab/HW assignments. I wonder if the expectation of the complexity level, for the case study, is similar to lab/HW assignments.
A: So, I’d love to chat more about this with anyone who has thoughts here b/c I’m always looking for the student perspective. Partially, this is by design. The main concepts are presented in lecture, lab gives you a low-stakes environment to deepen your understanding (since it’s graded on effort and there is an answer key provided), and then hw, now the third time you’ve seen/interacted with the material, is where it’s the most “difficult” because you’ve already seen this material before. If I presented the most complex stuff in lecture then people would leave confused b/c they just learned the basics. That said, while the course is designed this way, the leaps are not intended to feel insurmountable, so I’d love to hear more from students about where particularly they’re struggling. That all said, I think of the case studies as similar to HW. You’ve seen the material in lecture. You’ve interacted in lab. And, now, you’re working in groups on your third interaction with the material. Keep in mind that we do expect this to be the work of multiple group members. One group member should not be doing all the work.

Q: Will CS02 be with the same group or a different one?
A: They will be different. I’ll be asking about feedback on this policy at the end of the course. Last time we kept the same groups for all case studies (there were three last time) and final projects. Students requested different groups, so I tried that this quarter and will get feedback from you all on this!

Q: I am confused about some of the code provided in the bootstrapping section.
A: The shortest explanation is we wanted to run the same model a whole bunch of times…but if we ran it on the same dataset, we’d get the same answer. Instead, we want to see how stable the model is by running the model with a slightly different set of observations each time. To do this, we remove one observation for each model. If the model is stable, removing a single data point should not change the coefficients much…but if by removing a single observation we get very different coefficient estimates, that suggests something is off with our data or model. So, we run the model on all of the subsets, with each subset being slightly different (by one observation) than the next. We store the model outputs. Then, we compare all of the results.
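
The procedure described in this answer can be sketched as follows. This is only an illustration, not the code from the lecture: it uses the built-in `mtcars` data with a made-up model (`mpg ~ wt`), and the helper name `fit_without` is hypothetical. Dropping exactly one observation per refit, as described here, is commonly called leave-one-out (jackknife) resampling.

```r
# Refit the same model once per observation, leaving that observation out each time
fit_without <- function(i, data) {
  coef(lm(mpg ~ wt, data = data[-i, ]))
}

# One row of coefficients per left-out observation
loo_coefs <- t(sapply(seq_len(nrow(mtcars)), fit_without, data = mtcars))

# Coefficients from the full data, for comparison
full_coefs <- coef(lm(mpg ~ wt, data = mtcars))

# A stable model shows only a small spread across the refits
apply(loo_coefs, 2, range)
```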

Course Announcements

Due Dates:

  • Lecture Participation survey “due” after class
  • Lab07 due tomorrow (3/3; 11:59 PM)
  • CS01 due Mon (3/6; 11:59 PM)
    • Repo w/ Rmd AND HTML
    • Survey completed on group work (individually; canvas for link)

Notes:

  • lab07 now available
  • Example case study posted
  • Final Project
    • instructions posted on website
    • final project group repos will be created tomorrow
  • CS02 Groups Discussion
  • No HW04 (full credit will be posted)

Agenda

  • Background
  • Data Import
  • Data Wrangling

Background

Previously known before report?

What e-cigarette vapors contain…

Use associated with lung injury

Source: Chand et al.

Questions

  1. How has tobacco and e-cigarette/vaping use by American youths changed since 2015?
  2. How does e-cigarette use compare between males and females?
  3. What vaping brands and flavors appear to be used the most frequently?
  4. Is there a relationship between e-cigarette/vaping use and other tobacco use?

Limitations

  1. The National Youth Tobacco Survey (NYTS) does not follow the same individual student respondents over time. A longitudinal study that does follow the same individuals over time collects data called panel data. The data in this study are called pooled cross-sectional data and are obtained from a random collection of observations across time.

  2. The data include percentages of student respondents reporting use of each particular tobacco product, but the survey questions did not ask the relative amount of use of one product compared to another. For example, the survey included questions like: “What flavors of tobacco products have you used in the past 30 days?” but did not ask how often one flavor was used by the same individual over another.

  3. While gender and sex are not actually binary, the data used in this analysis only contain information for groups of individuals who answered the survey questions as male or female.

The Data

The Data: Source

Data come from the National Youth Tobacco Survey (NYTS)

  • annual survey that asks students in high school and middle school (grades 6-12) about tobacco usage in the United States of America
  • we’ll use data from 2015-2019

The Data: Format

  • One excel spreadsheet for each year
  • Corresponding codebook (explains what each variable stores)

The Data: Example

Codebook Example: Variables

Codebook Example: Details

Data Import

# only have to run this once 
# A good time for `eval=FALSE` in code chunk
OCSdata::load_simpler_import("ocs-bp-vaping-case-study", outpath = getwd())

👉 Your Turn: Load the data into RStudio.

  • The data have already been cleaned to only include columns of interest
  • will store 5 CSVs in data/simpler_import

Data Wrangling

# load packages used below: map(), read_csv(), str_extract()
library(tidyverse)

# read in CSVs (`pattern` is a regular expression, not a glob)
nyts_data <- list.files("data/simpler_import/", 
                        pattern = "\\.csv$", 
                        full.names = TRUE) |>
  map(read_csv)

# get names
nyts_data_names <- list.files("data/simpler_import/",
                              pattern = "\\.csv$") |>
  str_extract("nyts201[5-9]")

# apply names
names(nyts_data) <- nyts_data_names

💡 How are the data stored after this code has executed?
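
One way to explore this question without the real files is a toy version of the same approach. The sketch below writes two tiny CSVs to a temporary folder (the folder name `toy_import` and the file contents are made up) and reads them back in the same way as the import code above.

```r
library(purrr)
library(readr)
library(stringr)

# Write two tiny CSVs to a temporary folder (stand-ins for the real files)
tmp_dir <- file.path(tempdir(), "toy_import")
dir.create(tmp_dir, showWarnings = FALSE)
write_csv(data.frame(Q1 = c(1, 2)), file.path(tmp_dir, "nyts2015.csv"))
write_csv(data.frame(Q1 = c(3, 4, 5)), file.path(tmp_dir, "nyts2016.csv"))

# Read every CSV, then name each element by the year in its file name
toy_files <- list.files(tmp_dir, pattern = "\\.csv$", full.names = TRUE)
toy_data <- map(toy_files, \(f) read_csv(f, show_col_types = FALSE))
names(toy_data) <- str_extract(basename(toy_files), "nyts201[5-9]")

class(toy_data)          # a plain list ...
names(toy_data)          # ... with one named element per file
map_int(toy_data, nrow)  # each element is its own data frame
```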

Data Exploration

glimpse(nyts_data)
List of 5
 $ nyts2015: spc_tbl_ [17,711 × 29] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
  ..$ psu       : chr [1:17711] "015438" "015438" "015438" "015438" ...
  ..$ finwgt    : num [1:17711] 217 325 325 397 265 ...
  ..$ stratum   : chr [1:17711] "BR3" "BR3" "BR3" "BR3" ...
  ..$ Qn1       : num [1:17711] 10 9 10 10 10 10 10 10 10 10 ...
  ..$ Qn2       : num [1:17711] 2 1 1 1 2 2 1 2 1 2 ...
  ..$ Qn3       : num [1:17711] 7 7 7 7 7 7 7 7 7 7 ...
  ..$ ECIGT     : num [1:17711] 2 1 2 1 2 1 1 1 2 2 ...
  ..$ ECIGAR    : num [1:17711] 1 1 2 2 2 2 1 2 2 2 ...
  ..$ ESLT      : num [1:17711] 2 2 2 2 2 2 1 1 2 2 ...
  ..$ EELCIGT   : num [1:17711] 2 1 2 1 2 1 1 1 2 2 ...
  ..$ EROLLCIGTS: num [1:17711] 2 2 2 2 2 2 1 2 2 2 ...
  ..$ EFLAVCIGTS: num [1:17711] 2 2 2 1 2 2 1 2 2 2 ...
  ..$ EBIDIS    : num [1:17711] 2 2 2 2 2 2 2 2 2 2 ...
  ..$ EFLAVCIGAR: num [1:17711] 2 1 2 2 2 2 1 2 2 2 ...
  ..$ EHOOKAH   : num [1:17711] 2 2 2 2 2 2 2 1 2 2 ...
  ..$ EPIPE     : num [1:17711] 2 2 2 2 2 2 2 2 2 2 ...
  ..$ ESNUS     : num [1:17711] 2 2 2 2 2 2 1 2 2 2 ...
  ..$ EDISSOLV  : num [1:17711] 2 2 2 2 2 2 2 2 2 2 ...
  ..$ CCIGT     : num [1:17711] 2 1 2 2 2 2 2 2 2 2 ...
  ..$ CCIGAR    : num [1:17711] 2 1 2 2 2 2 2 2 2 2 ...
  ..$ CSLT      : num [1:17711] 2 2 2 2 2 2 2 2 2 2 ...
  ..$ CELCIGT   : num [1:17711] 2 2 2 1 2 2 2 2 2 2 ...
  ..$ CROLLCIGTS: num [1:17711] 2 2 2 2 2 2 2 2 2 2 ...
  ..$ CFLAVCIGTS: num [1:17711] 2 2 2 2 2 2 2 2 2 2 ...
  ..$ CBIDIS    : num [1:17711] 2 2 2 2 2 2 2 2 2 2 ...
  ..$ CHOOKAH   : num [1:17711] 2 2 2 2 2 2 2 2 2 2 ...
  ..$ CPIPE     : num [1:17711] 2 2 2 2 2 2 2 2 2 2 ...
  ..$ CSNUS     : num [1:17711] 2 2 2 2 2 2 2 2 2 2 ...
  ..$ CDISSOLV  : num [1:17711] 2 2 2 2 2 2 2 2 2 2 ...
  ..- attr(*, "spec")=
  .. .. cols(
  .. ..   psu = col_character(),
  .. ..   finwgt = col_double(),
  .. ..   stratum = col_character(),
  .. ..   Qn1 = col_double(),
  .. ..   Qn2 = col_double(),
  .. ..   Qn3 = col_double(),
  .. ..   ECIGT = col_double(),
  .. ..   ECIGAR = col_double(),
  .. ..   ESLT = col_double(),
  .. ..   EELCIGT = col_double(),
  .. ..   EROLLCIGTS = col_double(),
  .. ..   EFLAVCIGTS = col_double(),
  .. ..   EBIDIS = col_double(),
  .. ..   EFLAVCIGAR = col_double(),
  .. ..   EHOOKAH = col_double(),
  .. ..   EPIPE = col_double(),
  .. ..   ESNUS = col_double(),
  .. ..   EDISSOLV = col_double(),
  .. ..   CCIGT = col_double(),
  .. ..   CCIGAR = col_double(),
  .. ..   CSLT = col_double(),
  .. ..   CELCIGT = col_double(),
  .. ..   CROLLCIGTS = col_double(),
  .. ..   CFLAVCIGTS = col_double(),
  .. ..   CBIDIS = col_double(),
  .. ..   CHOOKAH = col_double(),
  .. ..   CPIPE = col_double(),
  .. ..   CSNUS = col_double(),
  .. ..   CDISSOLV = col_double()
  .. .. )
  ..- attr(*, "problems")=<externalptr> 
 $ nyts2016: spc_tbl_ [20,675 × 34] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
  ..$ psu       : chr [1:20675] "073102" "073102" "073102" "073102" ...
  ..$ finwgt    : num [1:20675] 2817 2817 2817 2817 3351 ...
  ..$ stratum   : chr [1:20675] "BR1" "BR1" "BR1" "BR1" ...
  ..$ Q1        : chr [1:20675] "08" "08" "07" "07" ...
  ..$ Q2        : chr [1:20675] "1" "1" "1" "1" ...
  ..$ Q3        : num [1:20675] 5 5 5 5 5 5 5 7 5 5 ...
  ..$ ECIGT     : num [1:20675] 2 1 2 2 2 1 2 2 NA 1 ...
  ..$ ECIGAR    : num [1:20675] 2 1 2 2 2 2 2 2 NA 2 ...
  ..$ ESLT      : num [1:20675] 2 2 2 2 2 2 2 1 1 2 ...
  ..$ EELCIGT   : num [1:20675] 2 1 2 2 2 1 2 2 NA 2 ...
  ..$ EHOOKAH   : num [1:20675] 2 1 2 2 2 2 2 2 NA 2 ...
  ..$ EROLLCIGTS: num [1:20675] 2 1 2 2 2 2 2 2 NA 2 ...
  ..$ EFLAVCIGAR: num [1:20675] 2 1 2 2 2 2 2 2 NA 2 ...
  ..$ EPIPE     : num [1:20675] 2 1 2 2 2 2 2 2 NA 2 ...
  ..$ ESNUS     : num [1:20675] 2 2 2 2 2 2 2 2 NA 2 ...
  ..$ EDISSOLV  : num [1:20675] 2 2 2 2 2 2 2 2 NA 2 ...
  ..$ EBIDIS    : num [1:20675] 2 2 2 2 2 2 2 2 NA 2 ...
  ..$ CCIGT     : num [1:20675] 2 1 2 2 2 1 2 2 NA 1 ...
  ..$ CCIGAR    : num [1:20675] 2 1 2 2 2 2 2 2 NA 2 ...
  ..$ CSLT      : num [1:20675] 2 2 2 2 2 2 2 1 1 2 ...
  ..$ CELCIGT   : num [1:20675] 2 2 2 2 2 2 2 2 NA 2 ...
  ..$ CHOOKAH   : num [1:20675] 2 2 2 2 2 2 2 2 1 2 ...
  ..$ CROLLCIGTS: num [1:20675] 2 1 2 2 2 2 2 2 NA 2 ...
  ..$ CPIPE     : num [1:20675] 2 2 2 2 2 2 2 2 NA 2 ...
  ..$ CSNUS     : num [1:20675] 2 2 2 2 2 2 2 2 NA 2 ...
  ..$ CDISSOLV  : num [1:20675] 2 2 2 2 2 2 2 2 NA 2 ...
  ..$ CBIDIS    : num [1:20675] 2 2 2 2 2 2 2 2 NA 2 ...
  ..$ Q50A      : num [1:20675] NA NA NA NA NA 1 NA NA NA NA ...
  ..$ Q50B      : num [1:20675] NA NA NA NA NA NA NA NA NA NA ...
  ..$ Q50C      : num [1:20675] NA 1 NA NA NA NA NA NA NA NA ...
  ..$ Q50D      : num [1:20675] NA NA NA NA NA NA NA NA NA NA ...
  ..$ Q50E      : num [1:20675] NA NA NA NA NA NA NA NA NA NA ...
  ..$ Q50F      : num [1:20675] NA 1 NA NA NA NA NA NA NA NA ...
  ..$ Q50G      : num [1:20675] NA 1 NA NA NA NA NA 1 NA NA ...
  ..- attr(*, "spec")=
  .. .. cols(
  .. ..   psu = col_character(),
  .. ..   finwgt = col_double(),
  .. ..   stratum = col_character(),
  .. ..   Q1 = col_character(),
  .. ..   Q2 = col_character(),
  .. ..   Q3 = col_double(),
  .. ..   ECIGT = col_double(),
  .. ..   ECIGAR = col_double(),
  .. ..   ESLT = col_double(),
  .. ..   EELCIGT = col_double(),
  .. ..   EHOOKAH = col_double(),
  .. ..   EROLLCIGTS = col_double(),
  .. ..   EFLAVCIGAR = col_double(),
  .. ..   EPIPE = col_double(),
  .. ..   ESNUS = col_double(),
  .. ..   EDISSOLV = col_double(),
  .. ..   EBIDIS = col_double(),
  .. ..   CCIGT = col_double(),
  .. ..   CCIGAR = col_double(),
  .. ..   CSLT = col_double(),
  .. ..   CELCIGT = col_double(),
  .. ..   CHOOKAH = col_double(),
  .. ..   CROLLCIGTS = col_double(),
  .. ..   CPIPE = col_double(),
  .. ..   CSNUS = col_double(),
  .. ..   CDISSOLV = col_double(),
  .. ..   CBIDIS = col_double(),
  .. ..   Q50A = col_double(),
  .. ..   Q50B = col_double(),
  .. ..   Q50C = col_double(),
  .. ..   Q50D = col_double(),
  .. ..   Q50E = col_double(),
  .. ..   Q50F = col_double(),
  .. ..   Q50G = col_double()
  .. .. )
  ..- attr(*, "problems")=<externalptr> 
 $ nyts2017: spc_tbl_ [17,872 × 33] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
  ..$ psu       : chr [1:17872] "600815" "600815" "600815" "600815" ...
  ..$ finwgt    : num [1:17872] 1234 1234 1234 1234 1234 ...
  ..$ stratum   : chr [1:17872] "HR1" "HR1" "HR1" "HR1" ...
  ..$ Q1        : chr [1:17872] "05" "04" "04" "04" ...
  ..$ Q2        : num [1:17872] 2 2 2 2 2 2 2 2 2 2 ...
  ..$ Q3        : num [1:17872] 2 2 2 2 2 1 1 1 1 1 ...
  ..$ ECIGT     : num [1:17872] 2 2 2 2 2 2 2 2 2 2 ...
  ..$ ECIGAR    : num [1:17872] 2 2 2 2 2 2 2 2 2 2 ...
  ..$ ESLT      : num [1:17872] 2 2 2 2 2 2 2 2 2 2 ...
  ..$ EELCIGT   : num [1:17872] 2 2 2 2 2 2 2 2 2 2 ...
  ..$ EHOOKAH   : num [1:17872] 2 2 2 2 2 2 2 2 2 2 ...
  ..$ EROLLCIGTS: num [1:17872] 2 2 2 2 2 2 2 2 2 2 ...
  ..$ EPIPE     : num [1:17872] 2 2 2 2 2 2 2 2 2 2 ...
  ..$ ESNUS     : num [1:17872] 2 2 2 2 2 2 2 2 2 2 ...
  ..$ EDISSOLV  : num [1:17872] 2 2 2 2 2 2 2 2 2 2 ...
  ..$ EBIDIS    : num [1:17872] 2 2 2 2 2 2 2 2 2 2 ...
  ..$ CCIGT     : num [1:17872] 2 2 2 2 2 2 2 2 2 2 ...
  ..$ CCIGAR    : num [1:17872] 2 2 2 2 2 2 2 2 2 2 ...
  ..$ CSLT      : num [1:17872] 2 2 2 2 2 2 2 2 2 2 ...
  ..$ CELCIGT   : num [1:17872] 2 2 2 2 2 2 2 2 2 2 ...
  ..$ CHOOKAH   : num [1:17872] 2 2 2 2 2 2 2 2 2 2 ...
  ..$ CROLLCIGTS: num [1:17872] 2 2 2 2 2 2 2 2 2 2 ...
  ..$ CPIPE     : num [1:17872] 2 2 2 2 2 2 2 2 2 2 ...
  ..$ CSNUS     : num [1:17872] 2 2 2 2 2 2 2 2 2 2 ...
  ..$ CDISSOLV  : num [1:17872] 2 2 2 2 2 2 2 2 2 2 ...
  ..$ CBIDIS    : num [1:17872] 2 2 2 2 2 2 2 2 2 2 ...
  ..$ Q50A      : num [1:17872] NA NA NA NA NA NA NA NA NA NA ...
  ..$ Q50B      : num [1:17872] NA NA NA NA NA NA NA NA NA NA ...
  ..$ Q50C      : num [1:17872] NA NA NA NA NA NA NA NA NA NA ...
  ..$ Q50D      : num [1:17872] NA NA NA NA NA NA NA NA NA NA ...
  ..$ Q50E      : num [1:17872] NA NA NA NA NA NA NA NA NA NA ...
  ..$ Q50F      : num [1:17872] NA NA NA NA NA NA NA NA NA NA ...
  ..$ Q50G      : num [1:17872] NA NA NA NA NA NA NA NA NA NA ...
  ..- attr(*, "spec")=
  .. .. cols(
  .. ..   psu = col_character(),
  .. ..   finwgt = col_double(),
  .. ..   stratum = col_character(),
  .. ..   Q1 = col_character(),
  .. ..   Q2 = col_double(),
  .. ..   Q3 = col_double(),
  .. ..   ECIGT = col_double(),
  .. ..   ECIGAR = col_double(),
  .. ..   ESLT = col_double(),
  .. ..   EELCIGT = col_double(),
  .. ..   EHOOKAH = col_double(),
  .. ..   EROLLCIGTS = col_double(),
  .. ..   EPIPE = col_double(),
  .. ..   ESNUS = col_double(),
  .. ..   EDISSOLV = col_double(),
  .. ..   EBIDIS = col_double(),
  .. ..   CCIGT = col_double(),
  .. ..   CCIGAR = col_double(),
  .. ..   CSLT = col_double(),
  .. ..   CELCIGT = col_double(),
  .. ..   CHOOKAH = col_double(),
  .. ..   CROLLCIGTS = col_double(),
  .. ..   CPIPE = col_double(),
  .. ..   CSNUS = col_double(),
  .. ..   CDISSOLV = col_double(),
  .. ..   CBIDIS = col_double(),
  .. ..   Q50A = col_double(),
  .. ..   Q50B = col_double(),
  .. ..   Q50C = col_double(),
  .. ..   Q50D = col_double(),
  .. ..   Q50E = col_double(),
  .. ..   Q50F = col_double(),
  .. ..   Q50G = col_double()
  .. .. )
  ..- attr(*, "problems")=<externalptr> 
 $ nyts2018: spc_tbl_ [20,189 × 33] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
  ..$ psu       : chr [1:20189] "015659" "015659" "015659" "015659" ...
  ..$ finwgt    : num [1:20189] 751 862 862 862 899 ...
  ..$ stratum   : chr [1:20189] "BR3" "BR3" "BR3" "BR3" ...
  ..$ Q1        : chr [1:20189] "04" "04" "05" "04" ...
  ..$ Q2        : chr [1:20189] "2" "2" "2" "2" ...
  ..$ Q3        : chr [1:20189] "1" "2" "2" "2" ...
  ..$ ECIGT     : num [1:20189] 2 2 2 2 2 1 2 1 2 2 ...
  ..$ ECIGAR    : num [1:20189] 2 2 2 2 1 NA 2 2 2 2 ...
  ..$ ESLT      : num [1:20189] 2 2 2 2 2 2 2 2 2 2 ...
  ..$ EELCIGT   : num [1:20189] 2 2 2 2 2 1 1 2 2 2 ...
  ..$ EHOOKAH   : num [1:20189] 2 2 2 NA 2 1 2 2 2 2 ...
  ..$ EROLLCIGTS: num [1:20189] 2 2 2 2 2 2 2 1 2 2 ...
  ..$ EPIPE     : num [1:20189] 2 2 2 2 2 2 2 2 2 2 ...
  ..$ ESNUS     : num [1:20189] 2 2 2 2 2 2 2 2 2 2 ...
  ..$ EDISSOLV  : num [1:20189] 2 2 2 2 2 2 2 2 2 2 ...
  ..$ EBIDIS    : num [1:20189] 2 2 2 2 2 2 2 2 2 2 ...
  ..$ CCIGT     : num [1:20189] 2 2 2 2 2 2 2 2 2 2 ...
  ..$ CCIGAR    : num [1:20189] 2 2 2 2 2 2 2 2 2 2 ...
  ..$ CSLT      : num [1:20189] 2 2 2 2 2 2 2 2 2 2 ...
  ..$ CELCIGT   : num [1:20189] 2 2 2 2 2 NA 2 2 2 2 ...
  ..$ CHOOKAH   : num [1:20189] 2 2 2 2 2 2 2 2 2 2 ...
  ..$ CROLLCIGTS: num [1:20189] 2 2 2 2 2 2 2 2 2 2 ...
  ..$ CPIPE     : num [1:20189] 2 2 2 2 2 2 2 2 2 2 ...
  ..$ CSNUS     : num [1:20189] 2 2 2 2 2 2 2 2 2 2 ...
  ..$ CDISSOLV  : num [1:20189] 2 2 2 2 2 2 2 2 2 2 ...
  ..$ CBIDIS    : num [1:20189] 2 2 2 2 2 2 2 2 2 2 ...
  ..$ Q50A      : num [1:20189] NA NA NA NA NA 1 NA NA NA NA ...
  ..$ Q50B      : num [1:20189] NA NA NA NA NA NA NA NA NA NA ...
  ..$ Q50C      : num [1:20189] NA NA NA NA NA 1 NA NA NA NA ...
  ..$ Q50D      : num [1:20189] NA NA NA NA NA NA NA NA NA NA ...
  ..$ Q50E      : num [1:20189] NA NA NA NA NA NA NA NA NA NA ...
  ..$ Q50F      : num [1:20189] NA NA NA NA NA 1 NA NA NA NA ...
  ..$ Q50G      : num [1:20189] NA NA NA NA NA NA NA NA NA NA ...
  ..- attr(*, "spec")=
  .. .. cols(
  .. ..   psu = col_character(),
  .. ..   finwgt = col_double(),
  .. ..   stratum = col_character(),
  .. ..   Q1 = col_character(),
  .. ..   Q2 = col_character(),
  .. ..   Q3 = col_character(),
  .. ..   ECIGT = col_double(),
  .. ..   ECIGAR = col_double(),
  .. ..   ESLT = col_double(),
  .. ..   EELCIGT = col_double(),
  .. ..   EHOOKAH = col_double(),
  .. ..   EROLLCIGTS = col_double(),
  .. ..   EPIPE = col_double(),
  .. ..   ESNUS = col_double(),
  .. ..   EDISSOLV = col_double(),
  .. ..   EBIDIS = col_double(),
  .. ..   CCIGT = col_double(),
  .. ..   CCIGAR = col_double(),
  .. ..   CSLT = col_double(),
  .. ..   CELCIGT = col_double(),
  .. ..   CHOOKAH = col_double(),
  .. ..   CROLLCIGTS = col_double(),
  .. ..   CPIPE = col_double(),
  .. ..   CSNUS = col_double(),
  .. ..   CDISSOLV = col_double(),
  .. ..   CBIDIS = col_double(),
  .. ..   Q50A = col_double(),
  .. ..   Q50B = col_double(),
  .. ..   Q50C = col_double(),
  .. ..   Q50D = col_double(),
  .. ..   Q50E = col_double(),
  .. ..   Q50F = col_double(),
  .. ..   Q50G = col_double()
  .. .. )
  ..- attr(*, "problems")=<externalptr> 
 $ nyts2019: spc_tbl_ [19,018 × 36] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
  ..$ psu       : num [1:19018] 58123 58123 58123 58123 58123 ...
  ..$ finwgt    : num [1:19018] 159 151 151 151 221 ...
  ..$ stratum   : chr [1:19018] "HR4" "HR4" "HR4" "HR4" ...
  ..$ Q1        : chr [1:19018] "7" "8" "6" "6" ...
  ..$ Q2        : chr [1:19018] "2" "1" "1" "1" ...
  ..$ Q3        : chr [1:19018] "4" "4" "4" "4" ...
  ..$ ECIGT     : chr [1:19018] "2" "2" "2" "2" ...
  ..$ ECIGAR    : chr [1:19018] "2" "2" "2" "2" ...
  ..$ ESLT      : chr [1:19018] "2" "2" "2" "2" ...
  ..$ EELCIGT   : chr [1:19018] "2" "2" "2" "2" ...
  ..$ EHOOKAH   : chr [1:19018] "2" "2" "2" "2" ...
  ..$ EROLLCIGTS: chr [1:19018] "2" "2" "2" "2" ...
  ..$ EPIPE     : chr [1:19018] "2" "2" "2" "2" ...
  ..$ ESNUS     : chr [1:19018] "2" "2" "2" "2" ...
  ..$ EDISSOLV  : chr [1:19018] "2" "2" "2" "2" ...
  ..$ EBIDIS    : chr [1:19018] "2" "2" "2" "2" ...
  ..$ EHTP      : chr [1:19018] "2" "2" "2" "2" ...
  ..$ CCIGT     : chr [1:19018] "2" "2" "2" "2" ...
  ..$ CCIGAR    : chr [1:19018] "2" "2" "2" "2" ...
  ..$ CSLT      : chr [1:19018] "2" "2" "2" "2" ...
  ..$ CELCIGT   : chr [1:19018] "2" "2" "2" "2" ...
  ..$ CHOOKAH   : chr [1:19018] "2" "2" "2" "2" ...
  ..$ CROLLCIGTS: chr [1:19018] "2" "2" "2" "2" ...
  ..$ CPIPE     : chr [1:19018] "2" "2" "2" "2" ...
  ..$ CSNUS     : chr [1:19018] "2" "2" "2" "2" ...
  ..$ CDISSOLV  : chr [1:19018] "2" "2" "2" "2" ...
  ..$ CBIDIS    : chr [1:19018] "2" "2" "2" "2" ...
  ..$ CHTP      : chr [1:19018] "2" "2" "2" "2" ...
  ..$ Q40       : chr [1:19018] ".S" ".S" ".S" ".S" ...
  ..$ Q62A      : chr [1:19018] ".S" ".S" ".S" ".S" ...
  ..$ Q62B      : chr [1:19018] ".S" ".S" ".S" ".S" ...
  ..$ Q62C      : chr [1:19018] ".S" ".S" ".S" ".S" ...
  ..$ Q62D      : chr [1:19018] ".S" ".S" ".S" ".S" ...
  ..$ Q62E      : chr [1:19018] ".S" ".S" ".S" ".S" ...
  ..$ Q62F      : chr [1:19018] ".S" ".S" ".S" ".S" ...
  ..$ Q62G      : chr [1:19018] ".S" ".S" ".S" ".S" ...
  ..- attr(*, "spec")=
  .. .. cols(
  .. ..   psu = col_double(),
  .. ..   finwgt = col_double(),
  .. ..   stratum = col_character(),
  .. ..   Q1 = col_character(),
  .. ..   Q2 = col_character(),
  .. ..   Q3 = col_character(),
  .. ..   ECIGT = col_character(),
  .. ..   ECIGAR = col_character(),
  .. ..   ESLT = col_character(),
  .. ..   EELCIGT = col_character(),
  .. ..   EHOOKAH = col_character(),
  .. ..   EROLLCIGTS = col_character(),
  .. ..   EPIPE = col_character(),
  .. ..   ESNUS = col_character(),
  .. ..   EDISSOLV = col_character(),
  .. ..   EBIDIS = col_character(),
  .. ..   EHTP = col_character(),
  .. ..   CCIGT = col_character(),
  .. ..   CCIGAR = col_character(),
  .. ..   CSLT = col_character(),
  .. ..   CELCIGT = col_character(),
  .. ..   CHOOKAH = col_character(),
  .. ..   CROLLCIGTS = col_character(),
  .. ..   CPIPE = col_character(),
  .. ..   CSNUS = col_character(),
  .. ..   CDISSOLV = col_character(),
  .. ..   CBIDIS = col_character(),
  .. ..   CHTP = col_character(),
  .. ..   Q40 = col_character(),
  .. ..   Q62A = col_character(),
  .. ..   Q62B = col_character(),
  .. ..   Q62C = col_character(),
  .. ..   Q62D = col_character(),
  .. ..   Q62E = col_character(),
  .. ..   Q62F = col_character(),
  .. ..   Q62G = col_character()
  .. .. )
  ..- attr(*, "problems")=<externalptr> 

Data Cleaning (Variable Names)

nyts_data[["nyts2015"]] <- nyts_data[["nyts2015"]] |>
  rename(Age = Qn1,
         Sex = Qn2,
         Grade = Qn3)
update_survey <- function(dataset) { 
  dataset |>
    rename(Age = Q1,
           Sex = Q2,
           Grade = Q3,
           menthol = Q50A,
           clove_spice = Q50B,
           fruit = Q50C,
           chocolate = Q50D,
           alcoholic_drink = Q50E,
           candy_dessert_sweets = Q50F,
           other = Q50G)
}
nyts_data <- nyts_data |> 
  map_at(c("nyts2016", "nyts2017", "nyts2018"), update_survey)

💡 Your Turn: Why are we only applying this function for three of the years?
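
As a reminder of how `map_at()` behaves, here is a minimal sketch on a made-up list (the names `a`, `b`, `c` and the values are illustrative only). The function is applied to the named elements, and the remaining elements come back unchanged.

```r
library(purrr)

# Toy list standing in for the per-year data (values are made up)
toy <- list(a = 1, b = 2, c = 3)

# map_at() applies the function only to the elements named in the second
# argument and returns the others as they were
toy_updated <- map_at(toy, c("b", "c"), \(x) x * 10)
toy_updated
```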

Note: some of the 2019 questions use the values “.N”, “.M”, “.S”, and “.Z” to indicate different types of missing data, so we turn these into NAs
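
The recoding in this note can be tried on a toy tibble first. The sketch below uses made-up values and `across()`, the current dplyr replacement for `mutate_all()`; it is one possible way to write the step, not the lecture's own code.

```r
library(dplyr)

# Toy data using the 2019 missing-data codes (values are made up)
toy_2019 <- tibble(Q40 = c("3", ".S", ".N"),
                   Q62A = c(".Z", "1", ".M"))

missing_codes <- c(".N", ".M", ".S", ".Z")

# Replace every missing-data code with NA in all columns
toy_clean <- toy_2019 |>
  mutate(across(everything(), \(col) replace(col, col %in% missing_codes, NA)))

toy_clean
```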

nyts_data[["nyts2019"]] <- nyts_data[["nyts2019"]] |>
  rename(brand_ecig = Q40,
         Age = Q1,
         Sex = Q2,
         Grade = Q3,
         menthol = Q62A,
         clove_spice = Q62B,
         fruit = Q62C,
         chocolate = Q62D,
         alcoholic_drink = Q62E,
         candy_dessert_sweets = Q62F,
         other = Q62G) |>
  mutate_all(~ replace(., . %in% c(".N", ".S", ".Z", ".M"), NA))
map(nyts_data, names)
$nyts2015
 [1] "psu"        "finwgt"     "stratum"    "Age"        "Sex"       
 [6] "Grade"      "ECIGT"      "ECIGAR"     "ESLT"       "EELCIGT"   
[11] "EROLLCIGTS" "EFLAVCIGTS" "EBIDIS"     "EFLAVCIGAR" "EHOOKAH"   
[16] "EPIPE"      "ESNUS"      "EDISSOLV"   "CCIGT"      "CCIGAR"    
[21] "CSLT"       "CELCIGT"    "CROLLCIGTS" "CFLAVCIGTS" "CBIDIS"    
[26] "CHOOKAH"    "CPIPE"      "CSNUS"      "CDISSOLV"  

$nyts2016
 [1] "psu"                  "finwgt"               "stratum"             
 [4] "Age"                  "Sex"                  "Grade"               
 [7] "ECIGT"                "ECIGAR"               "ESLT"                
[10] "EELCIGT"              "EHOOKAH"              "EROLLCIGTS"          
[13] "EFLAVCIGAR"           "EPIPE"                "ESNUS"               
[16] "EDISSOLV"             "EBIDIS"               "CCIGT"               
[19] "CCIGAR"               "CSLT"                 "CELCIGT"             
[22] "CHOOKAH"              "CROLLCIGTS"           "CPIPE"               
[25] "CSNUS"                "CDISSOLV"             "CBIDIS"              
[28] "menthol"              "clove_spice"          "fruit"               
[31] "chocolate"            "alcoholic_drink"      "candy_dessert_sweets"
[34] "other"               

$nyts2017
 [1] "psu"                  "finwgt"               "stratum"             
 [4] "Age"                  "Sex"                  "Grade"               
 [7] "ECIGT"                "ECIGAR"               "ESLT"                
[10] "EELCIGT"              "EHOOKAH"              "EROLLCIGTS"          
[13] "EPIPE"                "ESNUS"                "EDISSOLV"            
[16] "EBIDIS"               "CCIGT"                "CCIGAR"              
[19] "CSLT"                 "CELCIGT"              "CHOOKAH"             
[22] "CROLLCIGTS"           "CPIPE"                "CSNUS"               
[25] "CDISSOLV"             "CBIDIS"               "menthol"             
[28] "clove_spice"          "fruit"                "chocolate"           
[31] "alcoholic_drink"      "candy_dessert_sweets" "other"               

$nyts2018
 [1] "psu"                  "finwgt"               "stratum"             
 [4] "Age"                  "Sex"                  "Grade"               
 [7] "ECIGT"                "ECIGAR"               "ESLT"                
[10] "EELCIGT"              "EHOOKAH"              "EROLLCIGTS"          
[13] "EPIPE"                "ESNUS"                "EDISSOLV"            
[16] "EBIDIS"               "CCIGT"                "CCIGAR"              
[19] "CSLT"                 "CELCIGT"              "CHOOKAH"             
[22] "CROLLCIGTS"           "CPIPE"                "CSNUS"               
[25] "CDISSOLV"             "CBIDIS"               "menthol"             
[28] "clove_spice"          "fruit"                "chocolate"           
[31] "alcoholic_drink"      "candy_dessert_sweets" "other"               

$nyts2019
 [1] "psu"                  "finwgt"               "stratum"             
 [4] "Age"                  "Sex"                  "Grade"               
 [7] "ECIGT"                "ECIGAR"               "ESLT"                
[10] "EELCIGT"              "EHOOKAH"              "EROLLCIGTS"          
[13] "EPIPE"                "ESNUS"                "EDISSOLV"            
[16] "EBIDIS"               "EHTP"                 "CCIGT"               
[19] "CCIGAR"               "CSLT"                 "CELCIGT"             
[22] "CHOOKAH"              "CROLLCIGTS"           "CPIPE"               
[25] "CSNUS"                "CDISSOLV"             "CBIDIS"              
[28] "CHTP"                 "brand_ecig"           "menthol"             
[31] "clove_spice"          "fruit"                "chocolate"           
[34] "alcoholic_drink"      "candy_dessert_sweets" "other"               

Data Cleaning (Variable Values)

Values correspond to a category:

  • Age Value 1 == 9 years old
  • Grade Value 1 == 6th grade
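
The two examples imply a fixed offset between the coded value and the category: add 8 to get age in years and 5 to get grade level. A small worked sketch, with made-up coded values:

```r
# Coded survey values (made up); offsets follow the two examples:
# Age value 1 is 9 years old, Grade value 1 is 6th grade
age_code <- c("1", "5", "10")
grade_code <- c("1", "3", "7")

age_years <- as.numeric(age_code) + 8      # 9, 13, 18
grade_level <- as.numeric(grade_code) + 5  # 6, 8, 12
```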

Data Cleaning (Variable Values)

update_values <- function(dataset){
  dataset |>
    mutate_all(~ replace(., . %in% c("*", "**"), NA)) |>
    mutate(Age = as.numeric(Age) + 8,
           Grade = as.numeric(Grade) + 5) |>
    mutate(Age = as.factor(Age),
           Grade = as.factor(Grade),
           Sex = as.factor(Sex)) |>
    mutate(Sex = case_match(Sex,
                            "1" ~ "male",
                            "2" ~ "female")) |>
    mutate_all(~ replace(., . %in% c("*", "**"), NA)) |>
    mutate(Age = case_match(Age, "19" ~ ">18"),
           Grade = case_match(Grade,
                              "13" ~ "Ungraded/Other")) |>
    mutate_at(vars(starts_with("E", ignore.case = FALSE),
                   starts_with("C", ignore.case = FALSE)
    ), list( ~ recode(., `1` = TRUE,
                         `2`  = FALSE,
                         .default = NA,
                         .missing = NA)))
}

🧠 Your Turn: Explain what at least one function in here is doing.

nyts_data <- map(nyts_data, update_values)
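
One detail worth knowing when reading `update_values()`: `case_match()` returns NA for any value that none of its formulas match, unless `.default` is supplied. As written, the Age and Grade recodes each list a single formula, so every value other than "19" and "13" becomes NA, which is consistent with the NA values shown for Age and Grade in the combined data later on. A minimal sketch on made-up values, assuming dplyr 1.1.0 or later:

```r
library(dplyr)  # case_match() requires dplyr >= 1.1.0

# Toy ages after the +8 shift (values are made up)
age <- c("17", "18", "19")

# Without .default, values that match no formula become NA
without_default <- case_match(age, "19" ~ ">18")
# With .default, unmatched values are kept as they are
with_default <- case_match(age, "19" ~ ">18", .default = age)

without_default
with_default
```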

# function to count how many males
count_sex <- function(dataset) {
  dataset |>
    filter(Sex == "male") |>
    count(Sex) |>
    pull(n)
}
nyts_data[["nyts2019"]] <- nyts_data[["nyts2019"]]  |>
  mutate(psu = as.character(psu)) |>
  mutate(brand_ecig = recode(brand_ecig,
                             `1` = "Other", # levels 1,8 combined to `Other`
                             `2` = "Blu",
                             `3` = "JUUL",
                             `4` = "Logic",
                             `5` = "MarkTen",
                             `6` = "NJOY",
                             `7` = "Vuse",
                             `8` = "Other"))

According to the codebook, we should have:

  1. 8,958 males in 2015
  2. 10,438 males in 2016
  3. 8,881 males in 2017
  4. 10,069 males in 2018
  5. 9,803 males in 2019
# count how many males are in our dataset
map(nyts_data, count_sex)
$nyts2015
[1] 8958

$nyts2016
[1] 10438

$nyts2017
[1] 8881

$nyts2018
[1] 10069

$nyts2019
[1] 9803
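
Rather than comparing by eye, the check against the codebook can be automated. In this sketch the observed counts are typed in by hand from the output above so that it runs on its own; in practice they would come from `map(nyts_data, count_sex)`. The object names are made up for illustration.

```r
# Male counts listed in the codebook
expected_males <- c(nyts2015 = 8958, nyts2016 = 10438, nyts2017 = 8881,
                    nyts2018 = 10069, nyts2019 = 9803)

# Counts as printed above, entered by hand for this sketch
observed_males <- list(nyts2015 = 8958L, nyts2016 = 10438L, nyts2017 = 8881L,
                       nyts2018 = 10069L, nyts2019 = 9803L)

# TRUE for every year whose count matches the codebook
counts_match <- unlist(observed_males) == expected_males[names(observed_males)]
counts_match
```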

Flavor Data (2016-2019)

  • setting missing values to FALSE, then…
  • the TRUE values will represent those who reported using a specific flavor out of all users (rather than those who used a specific flavor compared with those who used a different flavor)
update_flavors <- function(dataset){
  dataset |>
    mutate_at(vars(menthol:other),
              list(~ recode(.,
                            `1` = TRUE,
                            .default = FALSE,
                            .missing = FALSE))) }

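# apply to every year except 2015, which has no flavor columns
# (selecting with vars() inside map_at() is deprecated in purrr 1.0.0)
nyts_data <- nyts_data |>
  map_at(setdiff(names(nyts_data), "nyts2015"), update_flavors)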
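
To see what the flavor recoding does to missing values, here is a sketch on a made-up tibble with two flavor columns. It uses `across()` instead of `mutate_at()`, but the `recode()` call is the same as in `update_flavors()`.

```r
library(dplyr)

# Toy flavor columns: 1 = flavor reported, NA = not reported (values are made up)
toy_flavors <- tibble(menthol = c(1, NA, NA),
                      fruit = c(NA, 1, NA))

# 1 becomes TRUE; anything else, including NA, becomes FALSE
toy_logical <- toy_flavors |>
  mutate(across(menthol:fruit,
                \(col) recode(col, `1` = TRUE, .default = FALSE, .missing = FALSE)))

toy_logical
```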

Combine the data!

nyts_data <- nyts_data |>
  map_df(bind_rows, .id = "year") |>
  mutate(year = as.numeric(str_remove(year, "nyts")))

Your Turn: What does this code do?
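
A toy version can help when working through this. The sketch below uses a made-up list of two small tibbles and calls `bind_rows()` directly on the list, which should give the same kind of stacked result; it is an illustration, not the lecture's code.

```r
library(dplyr)
library(stringr)

# Toy named list of data frames with partly different columns (values are made up)
toy_years <- list(nyts2015 = tibble(Sex = "male"),
                  nyts2019 = tibble(Sex = "female", brand_ecig = "JUUL"))

toy_combined <- toy_years |>
  bind_rows(.id = "year") |>   # the list names become a `year` column
  mutate(year = as.numeric(str_remove(year, "nyts")))

# Columns missing from one data frame are filled with NA
toy_combined
```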

The Data

glimpse(nyts_data)
Rows: 95,465
Columns: 40
$ year                 <dbl> 2015, 2015, 2015, 2015, 2015, 2015, 2015, 2015, 2…
$ psu                  <chr> "015438", "015438", "015438", "015438", "015438",…
$ finwgt               <dbl> 216.7268, 324.9620, 324.9620, 397.1552, 264.8745,…
$ stratum              <chr> "BR3", "BR3", "BR3", "BR3", "BR3", "BR3", "BR3", …
$ Age                  <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
$ Sex                  <chr> "female", "male", "male", "male", "female", "fema…
$ Grade                <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
$ ECIGT                <lgl> FALSE, TRUE, FALSE, TRUE, FALSE, TRUE, TRUE, TRUE…
$ ECIGAR               <lgl> TRUE, TRUE, FALSE, FALSE, FALSE, FALSE, TRUE, FAL…
$ ESLT                 <lgl> FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, TRUE, T…
$ EELCIGT              <lgl> FALSE, TRUE, FALSE, TRUE, FALSE, TRUE, TRUE, TRUE…
$ EROLLCIGTS           <lgl> FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, TRUE, F…
$ EFLAVCIGTS           <lgl> FALSE, FALSE, FALSE, TRUE, FALSE, FALSE, TRUE, FA…
$ EBIDIS               <lgl> FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, …
$ EFLAVCIGAR           <lgl> FALSE, TRUE, FALSE, FALSE, FALSE, FALSE, TRUE, FA…
$ EHOOKAH              <lgl> FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, …
$ EPIPE                <lgl> FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, …
$ ESNUS                <lgl> FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, TRUE, F…
$ EDISSOLV             <lgl> FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, …
$ CCIGT                <lgl> FALSE, TRUE, FALSE, FALSE, FALSE, FALSE, FALSE, F…
$ CCIGAR               <lgl> FALSE, TRUE, FALSE, FALSE, FALSE, FALSE, FALSE, F…
$ CSLT                 <lgl> FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, …
$ CELCIGT              <lgl> FALSE, FALSE, FALSE, TRUE, FALSE, FALSE, FALSE, F…
$ CROLLCIGTS           <lgl> FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, …
$ CFLAVCIGTS           <lgl> FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, …
$ CBIDIS               <lgl> FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, …
$ CHOOKAH              <lgl> FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, …
$ CPIPE                <lgl> FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, …
$ CSNUS                <lgl> FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, …
$ CDISSOLV             <lgl> FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, …
$ menthol              <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
$ clove_spice          <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
$ fruit                <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
$ chocolate            <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
$ alcoholic_drink      <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
$ candy_dessert_sweets <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
$ other                <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
$ EHTP                 <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
$ CHTP                 <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
$ brand_ecig           <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…

Current vs. ever users

We define these two groups as follows:

  1. current = students who used a product for >=1 day in the past 30 days
  2. ever = students who report having used or tried a product at any point in time

By these definitions, every current user is also an ever user, but not every ever user is a current user: current users are a subset of ever users.

What this looks like in the data:

  • EPIPE: Students who reported they have smoked tobacco from a pipe (not hookah).
  • CPIPE: Students who reported they smoked tobacco in a pipe (not hookah) during the past 30 days.
  • EROLLCIGTS: Students who reported they have tried smoking roll-your-own cigarettes.
  • CROLLCIGTS: Students who reported they smoked roll-your-own cigarettes during the past 30 days.
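
Because the responses are self-reported, the subset relationship may not hold in every row, so it can be worth checking. A minimal sketch on made-up data (the `toy` values are hypothetical; the column names mirror the pipe variables above):

```r
library(dplyr)

# Hypothetical toy data: the last row reports current use without ever use
toy <- tibble(
  EPIPE = c(TRUE, TRUE, FALSE, FALSE),
  CPIPE = c(TRUE, FALSE, FALSE, TRUE)
)

# Cross-tabulate ever vs. current use
pipe_counts <- toy |> count(EPIPE, CPIPE)

# Rows that contradict "current users are a subset of ever users"
inconsistent <- toy |> filter(CPIPE, !EPIPE)

pipe_counts
inconsistent
```

The same `count()` and `filter()` calls could be run on `nyts_data` for any E*/C* pair.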

Clean up columns: tobacco

nyts_data <- nyts_data %>%
  mutate(tobacco_sum_ever = rowSums(select(., starts_with("E", 
                                    ignore.case = FALSE)), na.rm = TRUE),
         tobacco_sum_current = rowSums(select(., starts_with("C", 
                                    ignore.case = FALSE)), na.rm = TRUE))  |>
  mutate(tobacco_ever = case_when(tobacco_sum_ever > 0 ~ TRUE,
                                  tobacco_sum_ever == 0 ~ FALSE),
         tobacco_current = case_when(tobacco_sum_current > 0 ~ TRUE,
                                     tobacco_sum_current == 0 ~ FALSE))

Your Turn: What does this code do?
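
To explore this, it can help to run the same pattern on a few made-up rows. The sketch below is not the course solution; `toy` and the `sum_current_loose` column are hypothetical and exist only to show what `ignore.case = FALSE` and `na.rm = TRUE` change.

```r
library(dplyr)

toy <- tibble(
  ECIGT     = c(TRUE, FALSE, NA),
  EPIPE     = c(TRUE, FALSE, FALSE),
  CCIGT     = c(FALSE, FALSE, NA),
  chocolate = c(TRUE, NA, NA)   # lowercase "c": a flavor column, not a "current" column
)

toy <- toy %>%
  mutate(
    # case-sensitive match: only columns starting with uppercase "E" or "C"
    sum_ever    = rowSums(select(., starts_with("E", ignore.case = FALSE)), na.rm = TRUE),
    sum_current = rowSums(select(., starts_with("C", ignore.case = FALSE)), na.rm = TRUE),
    # default (case-insensitive) match also picks up `chocolate`
    sum_current_loose = rowSums(select(., starts_with("C")), na.rm = TRUE)
  )

toy
```

Note that with `na.rm = TRUE`, a row whose product columns are all `NA` sums to 0, so it is later treated the same as a row of all `FALSE`.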

Clean up columns: e-cig/vaping vs others

nyts_data <- nyts_data %>% 
  # count e-cigarette use separately from all other E*/C* product columns
  mutate(ecig_sum_ever = rowSums(select(., EELCIGT), na.rm = TRUE),
         ecig_sum_current = rowSums(select(., CELCIGT), na.rm = TRUE),
         non_ecig_sum_ever = rowSums(select(., starts_with("E",  ignore.case = FALSE), 
                                            -EELCIGT), na.rm = TRUE),
         non_ecig_sum_current = rowSums(select(., starts_with("C", ignore.case = FALSE), 
                                               -CELCIGT), na.rm = TRUE)) |>
  # convert each count into a TRUE/FALSE indicator
  mutate(ecig_ever = case_when(ecig_sum_ever > 0 ~ TRUE,
                              ecig_sum_ever == 0 ~ FALSE),
         ecig_current = case_when(ecig_sum_current > 0 ~ TRUE,
                                  ecig_sum_current == 0 ~ FALSE),
         non_ecig_ever = case_when(non_ecig_sum_ever > 0 ~ TRUE,
                                   non_ecig_sum_ever == 0 ~ FALSE),
         non_ecig_current = case_when(non_ecig_sum_current > 0 ~ TRUE,
                                      non_ecig_sum_current == 0 ~ FALSE))

Specify use group

nyts_data <- nyts_data |>
  mutate(ecig_only_ever = case_when(ecig_ever == TRUE &
                                      non_ecig_ever == FALSE &
                                      ecig_current == FALSE &
                                      non_ecig_current == FALSE ~ TRUE,
                                    TRUE ~ FALSE),
         ecig_only_current = case_when(ecig_current == TRUE &
                                         non_ecig_ever == FALSE &
                                         non_ecig_current == FALSE ~ TRUE,
                                       TRUE ~ FALSE),
         non_ecig_only_ever = case_when(non_ecig_ever == TRUE &
                                          ecig_ever == FALSE &
                                          ecig_current == FALSE &
                                          non_ecig_current == FALSE ~ TRUE,
                                        TRUE ~ FALSE),
         non_ecig_only_current = case_when(non_ecig_current == TRUE &
                                             ecig_ever == FALSE &
                                             ecig_current == FALSE ~ TRUE,
                                           TRUE ~ FALSE),
         no_use = case_when(non_ecig_ever == FALSE &
                              ecig_ever == FALSE &
                              ecig_current == FALSE &
                              non_ecig_current == FALSE ~ TRUE,
                            TRUE ~ FALSE)) |>
  mutate(Group = case_when(ecig_only_ever == TRUE |
                             ecig_only_current == TRUE ~ "Only e-cigarettes",
                           non_ecig_only_ever == TRUE |
                             non_ecig_only_current == TRUE ~ "Only other products",
                           no_use == TRUE ~ "Neither",
                           ecig_only_ever == FALSE &
                             ecig_only_current == FALSE &
                             non_ecig_only_ever == FALSE &
                             non_ecig_only_current == FALSE &
                             no_use == FALSE ~ "Combination of products"))
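
The grouping logic is easier to reason about on a small grid of every possible input. The sketch below enumerates all 16 TRUE/FALSE combinations of the four indicators and compares the `Group` logic with a shorter two-flag version. `grid`, `ecig_any`, `non_ecig_any`, and `Group_compact` are hypothetical names introduced here, and the comparison assumes the indicators contain no `NA` values (they are built from `rowSums(..., na.rm = TRUE)`, so that should hold).

```r
library(dplyr)

# All 16 TRUE/FALSE combinations of the four indicator columns
grid <- expand.grid(ecig_ever = c(TRUE, FALSE),
                    ecig_current = c(TRUE, FALSE),
                    non_ecig_ever = c(TRUE, FALSE),
                    non_ecig_current = c(TRUE, FALSE))

grid <- grid |>
  mutate(
    # the five flags from the code above, written with plain logical operators
    ecig_only_ever = ecig_ever & !non_ecig_ever & !ecig_current & !non_ecig_current,
    ecig_only_current = ecig_current & !non_ecig_ever & !non_ecig_current,
    non_ecig_only_ever = non_ecig_ever & !ecig_ever & !ecig_current & !non_ecig_current,
    non_ecig_only_current = non_ecig_current & !ecig_ever & !ecig_current,
    no_use = !non_ecig_ever & !ecig_ever & !ecig_current & !non_ecig_current,
    Group = case_when(ecig_only_ever | ecig_only_current ~ "Only e-cigarettes",
                      non_ecig_only_ever | non_ecig_only_current ~ "Only other products",
                      no_use ~ "Neither",
                      TRUE ~ "Combination of products"),
    # hypothetical compact version: any e-cig use vs. any other product use
    ecig_any = ecig_ever | ecig_current,
    non_ecig_any = non_ecig_ever | non_ecig_current,
    Group_compact = case_when(ecig_any & !non_ecig_any ~ "Only e-cigarettes",
                              !ecig_any & non_ecig_any ~ "Only other products",
                              !ecig_any & !non_ecig_any ~ "Neither",
                              TRUE ~ "Combination of products")
  )

# Do the two versions agree on every combination?
all(grid$Group == grid$Group_compact)
table(grid$Group)
```

If the two columns agree, `Group` can be read as a 2x2 table: any e-cigarette use (ever or current) crossed with any other product use (ever or current).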

Add yearly survey totals

nyts_data <- nyts_data |> 
  add_count(year)
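
`add_count(year)` keeps every row and adds a column `n` holding the number of rows that share that row's `year`. A small sketch on made-up data (the `toy` values and the `pct_current` summary are hypothetical, and the percentage is unweighted, so it ignores the survey weights in `finwgt`):

```r
library(dplyr)

toy <- tibble(year = c(2015, 2015, 2016),
              ecig_current = c(TRUE, FALSE, TRUE))

# Same number of rows as before, plus a per-year total in `n`
toy <- toy |> add_count(year)

# One possible use of `n`: the unweighted share of current users per year
toy_pct <- toy |>
  group_by(year) |>
  summarise(pct_current = 100 * sum(ecig_current) / first(n))

toy
toy_pct
```

In the full data, this is why `n` repeats the same value (17711 for the 2015 rows) on every row of a given year.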

The Data

glimpse(nyts_data)
Rows: 95,465
Columns: 59
$ year                  <dbl> 2015, 2015, 2015, 2015, 2015, 2015, 2015, 2015, …
$ psu                   <chr> "015438", "015438", "015438", "015438", "015438"…
$ finwgt                <dbl> 216.7268, 324.9620, 324.9620, 397.1552, 264.8745…
$ stratum               <chr> "BR3", "BR3", "BR3", "BR3", "BR3", "BR3", "BR3",…
$ Age                   <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
$ Sex                   <chr> "female", "male", "male", "male", "female", "fem…
$ Grade                 <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
$ ECIGT                 <lgl> FALSE, TRUE, FALSE, TRUE, FALSE, TRUE, TRUE, TRU…
$ ECIGAR                <lgl> TRUE, TRUE, FALSE, FALSE, FALSE, FALSE, TRUE, FA…
$ ESLT                  <lgl> FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, TRUE, …
$ EELCIGT               <lgl> FALSE, TRUE, FALSE, TRUE, FALSE, TRUE, TRUE, TRU…
$ EROLLCIGTS            <lgl> FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, TRUE, …
$ EFLAVCIGTS            <lgl> FALSE, FALSE, FALSE, TRUE, FALSE, FALSE, TRUE, F…
$ EBIDIS                <lgl> FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE,…
$ EFLAVCIGAR            <lgl> FALSE, TRUE, FALSE, FALSE, FALSE, FALSE, TRUE, F…
$ EHOOKAH               <lgl> FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE,…
$ EPIPE                 <lgl> FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE,…
$ ESNUS                 <lgl> FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, TRUE, …
$ EDISSOLV              <lgl> FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE,…
$ CCIGT                 <lgl> FALSE, TRUE, FALSE, FALSE, FALSE, FALSE, FALSE, …
$ CCIGAR                <lgl> FALSE, TRUE, FALSE, FALSE, FALSE, FALSE, FALSE, …
$ CSLT                  <lgl> FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE,…
$ CELCIGT               <lgl> FALSE, FALSE, FALSE, TRUE, FALSE, FALSE, FALSE, …
$ CROLLCIGTS            <lgl> FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE,…
$ CFLAVCIGTS            <lgl> FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE,…
$ CBIDIS                <lgl> FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE,…
$ CHOOKAH               <lgl> FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE,…
$ CPIPE                 <lgl> FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE,…
$ CSNUS                 <lgl> FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE,…
$ CDISSOLV              <lgl> FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE,…
$ menthol               <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
$ clove_spice           <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
$ fruit                 <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
$ chocolate             <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
$ alcoholic_drink       <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
$ candy_dessert_sweets  <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
$ other                 <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
$ EHTP                  <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
$ CHTP                  <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
$ brand_ecig            <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
$ tobacco_sum_ever      <dbl> 1, 4, 0, 3, 0, 2, 8, 4, 0, 0, 0, 1, 1, 0, 0, 4, …
$ tobacco_sum_current   <dbl> 0, 2, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
$ tobacco_ever          <lgl> TRUE, TRUE, FALSE, TRUE, FALSE, TRUE, TRUE, TRUE…
$ tobacco_current       <lgl> FALSE, TRUE, FALSE, TRUE, FALSE, FALSE, FALSE, F…
$ ecig_sum_ever         <dbl> 0, 1, 0, 1, 0, 1, 1, 1, 0, 0, 0, 1, 0, 0, 0, 1, …
$ ecig_sum_current      <dbl> 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
$ non_ecig_sum_ever     <dbl> 1, 3, 0, 2, 0, 1, 7, 3, 0, 0, 0, 0, 1, 0, 0, 3, …
$ non_ecig_sum_current  <dbl> 0, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
$ ecig_ever             <lgl> FALSE, TRUE, FALSE, TRUE, FALSE, TRUE, TRUE, TRU…
$ ecig_current          <lgl> FALSE, FALSE, FALSE, TRUE, FALSE, FALSE, FALSE, …
$ non_ecig_ever         <lgl> TRUE, TRUE, FALSE, TRUE, FALSE, TRUE, TRUE, TRUE…
$ non_ecig_current      <lgl> FALSE, TRUE, FALSE, FALSE, FALSE, FALSE, FALSE, …
$ ecig_only_ever        <lgl> FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE,…
$ ecig_only_current     <lgl> FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE,…
$ non_ecig_only_ever    <lgl> TRUE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, …
$ non_ecig_only_current <lgl> FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE,…
$ no_use                <lgl> FALSE, FALSE, TRUE, FALSE, TRUE, FALSE, FALSE, F…
$ Group                 <chr> "Only other products", "Combination of products"…
$ n                     <int> 17711, 17711, 17711, 17711, 17711, 17711, 17711,…

Save the Data

save(nyts_data, file = "data/wrangled/wrangled_data_vaping.rda")
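
`save()` writes to an existing directory and `load()` restores the object under its original name. A minimal round-trip sketch, using a temporary directory and a made-up `toy` data frame so it runs anywhere (the paths here are assumptions, not the course's folder layout):

```r
# Hypothetical output location inside a temporary directory
out_dir <- file.path(tempdir(), "data", "wrangled")
dir.create(out_dir, recursive = TRUE, showWarnings = FALSE)  # make sure the folder exists

toy <- data.frame(year = c(2015, 2016))
out_file <- file.path(out_dir, "toy.rda")
save(toy, file = out_file)

# Load into a separate environment so nothing in the workspace is overwritten
env <- new.env()
loaded <- load(out_file, envir = env)

loaded    # names of the objects that were restored
env$toy   # the restored data frame
```

In a later analysis script, `load("data/wrangled/wrangled_data_vaping.rda")` would bring `nyts_data` back into the workspace in the same way.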

Suggested Reading